FIELD OF THE INVENTION

The present invention relates to techniques for direct
memory access (DMA).

BACKGROUND OF THE INVENTION 
As depicted in FIG. 1, a typical System On Chip
arrangement for direct memory access requires a CPU to
communicate with a number of blocks (intellectual properties or
IPs) generically called A, B, C, which can be connected together.
In such a prior art arrangement data transfer from A to B and
from B to C is scheduled by the CPU that monitors the state of
the ongoing processes on the basis of interrupts from the blocks.

A wide variety of possible variants of such a basic 
arrangement are known in the art. 

For instance, in US-A-4 481 578 a DMA arrangement is
disclosed including a DMA controller connected to each of a
plurality of processors to facilitate transfer of bulk data from
one memory to the other without the intervention of either or
both processors.

In US-A-5 212 795 a programmable DMA controller
arrangement is disclosed for regulating access of each of a
number of I/O devices to a bus. The arrangement includes a
priority register storing priorities of bus access from the I/O
devices, an interrupt register storing bus access requests of the
I/O devices, a resolver for selecting one of the I/O devices to
have access to the bus, a pointer register storing addresses of
locations in a memory for communication with the one I/O device
via the bus, a sequence register storing an address of a location
in the memory containing a channel program instruction which is
to be executed next, an ALU for incrementing or decrementing
addresses stored in the pointer register, computing the next
address to be stored in the sequence register and computing the
initial contents of the registers.

In US-A-5 634 099 another DMA arrangement (designated a
Direct Access Memory Unit or DAU) is disclosed wherein the CPU
requests a DMA by writing information relevant to the DMA to a
remote processor's memory. The CPU can abort a pending DMA
request during DAU operations by setting a skip bit in a control
block, while an interrupt can also be sent to the CPU wherein the
CPU is advised that a DMA request has been completed.

In US-B-6 341 328, multiple co-dependent DMA controllers
are provided to read and write common data blocks to two
peripheral devices. As a result, only one read and one write
command are required for the data to be written to two peripheral
devices.

In US-A-2002/0038393 a distributed DMA arrangement is
disclosed for use within a system on a chip (SoC). The DMA
controller units are distributed to various function modules
desiring direct memory access. The function modules interface to
a system bus over which the direct memory access occurs. A
global buffer memory is coupled to the system bus and bus
arbitrators are used to arbitrate which function modules have
access to the system bus to perform the direct memory access.
Once a function module is selected by the bus arbitrator to
have access to the system bus, it can establish a DMA routine
with the global buffer memory.

OBJECT OF THE INVENTION 

The object of the present invention is thus to provide
an improved DMA arrangement overcoming the intrinsic
disadvantages of the prior art arrangements considered in the
foregoing. According to the invention such an object is
achieved by means of the method specifically called for in the
claims that follow.

SUMMARY OF THE INVENTION 
A preferred application of the invention is in
exchanging data within a direct memory access (DMA) arrangement
including a plurality of IP blocks. An embodiment of the
invention provides for associating to the IP blocks respective
DMA modules, each including an input buffer and an output buffer.
These DMA modules are coupled over a data transfer facility in a
chain arrangement wherein each DMA module, other than the last in
the chain, has its output buffer coupled to the input buffer of
another DMA module downstream in the chain and each DMA module,
other than the first in the chain, has its input buffer coupled
to the output buffer of another DMA module upstream in the chain.
Each DMA module is caused to interact with the respective IP
block by writing data from the input buffer of the DMA module
into the respective IP block and reading data from the respective
IP block into the output buffer of the DMA module. The input and
output buffers of the DMA modules are operated in such a way
that:
- writing of data from the input buffer of the DMA module into
the respective IP block is started when the input buffer is at
least partly filled with data;
- when reading of data from the respective IP block into the
output buffer of the DMA module is completed, the data in the
output buffer of the DMA module are transferred to the input
buffer of the DMA module downstream in the chain or, in the case
of the last DMA module in the chain, are provided as output data.

A particularly preferred embodiment of the invention
provides for associating to the output buffers and input buffers
coupled in the chain at least one intermediate block to control
data transfer between the buffers coupled to each other.
Transfer of data between the coupled buffers over the data
transfer facility is then controlled by issuing at least one
request of a requesting buffer for a buffer coupled therewith to
indicate at least one transfer condition out of i) data existing
to be transferred and ii) enough space existing for receiving
said data when transferred. At least one corresponding
acknowledgement is then issued towards the requesting buffer
confirming that the at least one transfer condition is met and
data are transferred between the requesting buffer and the buffer
coupled therewith. The data transfer facility (BUS) between the
two coupled buffers is thus left free between the request and the
acknowledgement.

A CPU may be included in the arrangement considered for
transferring data to be processed into the input buffer of the
first DMA module in the chain, and collecting the output data
from the output buffer of the last DMA module in the chain. The
CPU may also be used for configuring the DMA modules.

The invention also includes the architecture of a module
for implementing the method referred to in the foregoing as well
as a computer program product directly loadable into the memory
of a digital computer and including software code portions for
carrying out the method of the invention when the product is run
on a computer.

A preferred embodiment of the invention (that can be
referred to as "intelligent" direct memory access or IDMA) has
been developed to provide a reusable wrapper with DMA
capabilities for hardware IP blocks. The aim is to realize e.g.
a System On Chip prototyping system structured as a bus system
where all the IPs are accessible through the system bus. An
embodiment of the IDMA architecture presented herein can also be
used as the final solution in integrated Systems On Chip.

A significant feature of the preferred embodiment of
the IDMA described herein [...] on the bus. This improves
overall performance of a SoC where several IDMA+IP blocks are
used.

BRIEF DESCRIPTION OF THE DRAWING 
An embodiment of the invention will now be described by
way of example only, by referring to the annexed views, in which:

FIG. 1 has already been described in the foregoing;

FIG. 2 is a block diagram illustrating a System on Chip
architecture for intelligent DMA (IDMA);

FIG. 3 shows a typical example of System on Chip design flow;

FIG. 4 is a block diagram of an embodiment of IDMA module
architecture;

FIGS. 5 and 6 are further detailed block diagrams of parts of
the embodiment of FIG. 4;

FIG. 7 is a flow chart illustrating certain instruction flows
within an embodiment of the invention;

FIGS. 8 and 9 are additional detailed block diagrams of other
parts of the embodiment of FIG. 4;

FIG. 10 shows a possible corresponding memory organization;
and

FIG. 11 shows a specific embodiment of the system of FIG. 2.

SPECIFIC DESCRIPTION
The exemplary architecture shown in FIG. 2 is a
bus-based system including a bus (BUS) as a basic data transfer
facility. The architecture also includes a CPU, a memory block
MEM plus a plurality of IPs designated A, B, C. These IPs are
accessible through the system bus via respective "intelligent"
DMA modules (IDMAs) designated IDMA A, IDMA B, and IDMA C,
respectively.

Each IDMA module (hereinafter, briefly, IDMA) can be
described as a respective version of a single VHDL core suitable
to be easily adapted to different IPs by modifying a given set of
parameters. This is a point that makes IDMA a suitable solution
for fast prototyping.

To understand the role of the IDMA in SoC design one
may refer to FIG. 3 that represents the typical design
flow of a SoC.

Starting from system specification 100 based on
literature 101 and documents 102, an early system simulation step
103 is realized in a bit true style e.g. based on a Matlab 104 or
C/C++ description 105 to obtain results as close as possible to
the final implementation.

After this step, a partition 106 must be effected in
order to identify those modules 107a that will be realized with
customized hardware parts (generally third party IPs) and those
modules 107b that will be implemented as a software routine by
an embedded processor.

Generating the hardware modules 107a requires steps
such as hardware architecture exploration 108, followed by
hardware synthesis 109 and technology mapping 110, the steps 108
to 110 being carried out with IPs both of the "soft" type 112 and
the "hard" type 114.

Similarly, generating the software modules 107b
requires software description 116, C code translation 118 and a
microprocessor development system 120.

The partition 106 must be verified and tested in a
HW/SW prototyping step, designated 122, before proceeding, in
step 124, to SoC realization proper.

Many partitions might be prototyped before defining the
best one. This may require a very long time if the prototype is
not fast.

In fact one of the critical issues of prototyping (and
developing) a SoC is interfacing HW and SW parts through a system
bus in a general arrangement as shown in FIG. 1.

Generally, IPs such as IPs 112 and 114 have very simple
interfaces that must be adapted to the bus. This affects the
timing and the logical meanings of the signals. Also, the
schedule of the IP must be controlled and this implies a complex
activity on the System On Chip controller.

If the scheduling of different blocks is not properly
organized, the bus may be congested. Consequently, "interfacing"
a new IP to the system bus may require a very long time in terms
of new design and interfaces development.

The IDMA architecture generally indicated 10 in FIG. 4
is particularly adapted for overcoming the drawbacks that are
inherent in prior art arrangements. To that effect, it includes
input and output buffers 11 and 12 to generate the input data to
the respective IP (not shown) and receive therefrom corresponding
IP output data, respectively.

The buffers 11 and 12 cooperate with a reprogrammable
FSM (Finite State Machine) such as a RAM based FSM (briefly RBF)
13 of a known type that manages the IP controls. The RBF 13 is
activated and controlled by an MCU (Main Control Unit) 14.

References 15 and 16 designate two master blocks,
designated "master-in" and "master-out" blocks, respectively.
The master blocks 15 and 16 permit data communication from a
system bus 17, preferably patterned after advanced microcontroller
bus architecture (AMBA), to the input buffer 11 and from the
buffer 12 to the bus 17. Reference 18 designates an AMBA AHB
(Advanced High-performance Bus) slave interface.

Reference 19 designates an internal register
(Instruction Register or IR) interposed between the MCU 14 and
the slave interface 18.

Finally, reference 20 designates an interrupt
controller (INT CTRL) activated by the MCU 14 to generate
interrupts (up to 16, in the embodiment shown).

The IDMA 10 has a very flexible IP interface that can be easily
adapted to different IPs. This reduces the time needed to
connect a new IP.

The IDMA 10 can be configured according to the IP
requirements. This feature eliminates the time needed to build a
custom logic to drive a new IP.

The IDMA 10 has different master and slave bus
interfaces. This feature eliminates the time required to build a
particular slave or master interface. Thanks to these features
the IDMA reduces the time required to set up a new prototype.

The IDMA architecture 10 is very simple and flexible,
which makes it particularly suitable to be expanded to
accommodate, e.g., new bus and IP interfaces.

The core description is technology independent; it can
be used on different prototyping systems and also on the final
implementation. In the presently preferred embodiment, the core
was developed in VHDL but it can be exported to a Verilog
project.

The description will now refer to a situation
encountered by a designer implementing a design flow as shown in
FIG. 3. When wishing to define a new partitioning with
new IPs, in order to connect to the bus through the IDMA 10, the
designer will just need to perform a few operations.

As a first step, a VHDL wrapper will have to be created
in order to connect the IP and the IDMA 10.

To that effect, a suitable set of parameters (e.g. a
set of twenty integer values) is chosen according to the
application requirements. Preferably, creation of the VHDL
wrapper and insertion of the application parameters are performed
with the support of a suitable software tool.

The IDMA/IP core is then included in the design, while
programming the IDMA run time and running the prototyping
emulation.

In the embodiment shown, a virtual channel is created
between those blocks that must be directly connected together.
The blocks may then activate data transfer between them only when
necessary and without any CPU usage. This feature makes the IDMA
shown herein an "intelligent" one.

The virtual channel is realized by resorting to the two
buffers 11 and 12 (for the data input to the IP and the data
output from the IP, respectively) that allow a logical separation
between the IP and the bus.

The DMA characteristics are obtained by means of the
master block(s) 15 and 16 associated with the input buffer 11
and/or the output buffer 12.

This feature leads to the core becoming able to access
the bus directly; the input and output buffers 11 and 12 are
intended to store data to and from the IP and to optimize access
to the bus.

Also, it will be appreciated that providing two
separate master blocks (masterin 15, masterout 16) within the
architecture of FIG. 4 may represent a preferred choice in
terms of flexibility. However just one block is usually
activated in each IDMA.

In the following, the various blocks comprising the
architecture 10 will be detailed.

Architecture 10 is generally intended to co-operate
with "internal" registers not explicitly shown in FIG. 4.
Each of these registers is implemented in a particular module
according to its function and is accessed through the system bus.
There are e.g. 33 registers whose width varies between 32 bits
and 1 bit. Nevertheless they are mapped in the memory plan on
32-bit word aligned addresses. Write and read access
availability may change according to different implementations
for each register.

The general input-output layout of the input buffer
(inbuffer) 11 is shown in FIG. 5.

Preferably, this block is realized as a FIFO memory.
Input data come from the system bus 17 and output data
are sent to the IP input interface. When data are stored in the
inbuffer the IDMA 10 can download them to the IP while the system
bus is free. Writing and reading operations can take place
simultaneously, thus allowing data to be downloaded to the IP
while the bus 17 is filling the buffer.

A preferred feature of this block is that the input
data are 32 bits wide while the width of the output to the IP can
be programmed run time by the user. This implies a significant
optimization of the bus activities. For instance, if an IP
requires 64 bits to start processing and has a 1-bit wide port,
the user can download all the data into the input buffer 11 with
2 bus cycles. Then the IDMA 10 sends one data item per clock
cycle to the IP while the system bus can be used for other
purposes.
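
Purely by way of illustration, the width adaptation just described can be sketched in software as follows. The sketch is a behavioural model, not the VHDL core: the class name InBufferSketch, its methods and the most-significant-bit-first ordering are assumptions introduced here and do not appear in the embodiment.

```python
from collections import deque


class InBufferSketch:
    """Toy FIFO written in 32-bit words and read in narrower items."""

    WORD_BITS = 32  # bus-side width stated in the text

    def __init__(self, data_size: int) -> None:
        if not 1 <= data_size <= self.WORD_BITS:
            raise ValueError("data_size must be between 1 and 32 bits")
        self.data_size = data_size  # IP-side read width (run-time programmable)
        self._bits = deque()        # stored bits, oldest first
        self.bus_cycles = 0         # number of 32-bit bus writes performed
        self.ip_cycles = 0          # number of IP-side reads performed

    def write_word(self, word: int) -> None:
        """One bus write: store a 32-bit word, MSB first (assumed order)."""
        for shift in range(self.WORD_BITS - 1, -1, -1):
            self._bits.append((word >> shift) & 1)
        self.bus_cycles += 1

    def read_item(self) -> int:
        """One IP-side read: pop data_size bits and return them as an integer."""
        if len(self._bits) < self.data_size:
            raise RuntimeError("not enough bits in the buffer")
        value = 0
        for _ in range(self.data_size):
            value = (value << 1) | self._bits.popleft()
        self.ip_cycles += 1
        return value

    def bits_available(self) -> int:
        """Number of bits currently stored."""
        return len(self._bits)


# The example of the text: 64 bits loaded in 2 bus writes, then
# delivered to a 1-bit wide IP port one item per read.
buf = InBufferSketch(data_size=1)
buf.write_word(0xDEADBEEF)
buf.write_word(0x00000001)
bits = [buf.read_item() for _ in range(64)]
```

In this model the bus is involved only in the two write_word calls; the 64 read_item calls stand for IP-side clock cycles during which the bus would remain available.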

The input buffer 11 is always accessed through the same
bus address (it is a FIFO memory and always has the same entry
point). In this case, the FIFOWR signal is driven high while the
data is strobed on the FIFODIN port. Before storing the
data, the total amount of bits to be loaded must be communicated
to the inbuffer by driving high the BTN_VAL signal and driving the
corresponding value on the BTN port. This allows the input
buffer 11 to organize the data when the total amount is not a
multiple of 32 bits. The writing operation can be driven by the
masterin module 15 or by external components through the slave
interface module 18. In the memory organization, the BTN port is
handled as an internal register.

When reading the data the FIFOOE signal must be driven
high. The data are available through the FIFODOUT port when the
DOUTVAL signal is driven high. FIFODOUT is connected directly to
the IP. The RBF 13 controls FIFOOE and DOUTVAL.

The FIFODIMENSION and DATASIZE ports can be used to
configure the input buffer 11. DATASIZE indicates how many bits
must be read at the same time. FIFODIMENSION is used to
configure the size of the buffer in terms of 32-bit words. The
value driven on FIFODIMENSION cannot exceed the physical size of
the inbuffer. These values can be configured by external
components through the SLAVE_INTERFACE port. FIFODIMENSION and
DATASIZE correspond to internal registers.

The VALIDBIT (VBI) and BITINFIFO ports provide
information regarding the internal status of the buffer.
In particular VALIDBIT indicates how many bits are
available for reading and BITINFIFO indicates how many bits are
physically present in the inbuffer 11.

These values are not always identical because of the
internal organization of the data. These values can be read by
the other modules in the IDMA 10 and also by an external
component through the slave interface module 18.

This feature allows external modules to obtain
information as to the buffer status and decide whether or not
to download data to the buffer. VALIDBIT and BITINFIFO
correspond to internal registers.

The entire contents of the input buffer 11 can be read
and written through a RAM port without changing the internal
status of the FIFO pointers. This feature allows debug
operations on the buffer. The RAM port is accessible through the
slave interface module 18.

The general input-output layout of the output buffer
(outbuffer) 12 is shown in FIG. 6.

This block is again realized as a FIFO memory. Input
data come from the IP and output data are sent to other modules
through the system bus 17. When data are driven out from the IP
the IDMA 10 can write them in the output buffer 12 while the bus
is free. Writing and reading operations can occur at the same
time, allowing data to be downloaded from the IP while the bus 17
is reading the buffer 12.

An advantageous feature of this block is that the input
data width can be programmed run time by the user while the
output data width is e.g. 32 bits. This implies a significant
optimization of the bus activities. For instance, if an IP
provides a 2-bit wide output, it will require 32 clock cycles to
provide 64 bits that can be read through the bus in 2 clock
cycles.
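
The cycle counts quoted in this example follow from simple ceiling divisions, which may be checked with the short sketch below. The function names ip_cycles and bus_cycles are hypothetical helpers introduced here for illustration; the sketch assumes one IP item or one 32-bit bus word per clock cycle, as in the example of the text.

```python
BUS_WIDTH_BITS = 32  # bus-side width of the buffers, as stated in the text


def ip_cycles(total_bits: int, ip_width_bits: int) -> int:
    """Clock cycles the IP needs to move total_bits over an ip_width_bits port."""
    return -(-total_bits // ip_width_bits)  # ceiling division


def bus_cycles(total_bits: int) -> int:
    """Bus cycles needed to move total_bits as 32-bit words."""
    return -(-total_bits // BUS_WIDTH_BITS)


# Example of the text: 64 bits produced on a 2-bit wide IP output.
example = (ip_cycles(64, 2), bus_cycles(64))
```

For the figures of the text, the IP side takes 32 cycles while the bus side takes 2, so the bus is occupied for a small fraction of the time the IP spends producing the data.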

When writing data the FIFOWR signal must be driven high
while data are driven on the FIFODIN port. FIFODIN is connected
directly to the IP. The RBF 13 controls FIFOWR.

The output buffer 12 is read through the same bus
address (it is a FIFO memory and always has the same output
point). In this case the FIFOOE signal is driven high and data
are strobed on the next clock cycle. In the embodiment shown,
the output buffer read operation is always 32 bits wide. The
reading operation can be driven by the masterout module 16 or by
external components through the slave interface 18.

The FIFODIMENSION and DATASIZE ports can be used to
configure the output buffer 12. DATASIZE indicates how many
bits must be read at the same time. FIFODIMENSION is used to
configure the size of the buffer in terms of 32-bit words. The
value driven on FIFODIMENSION does not exceed the physical
dimension of the output buffer 12. These values can be
configured by external components through the SLAVE_INTERFACE
port. FIFODIMENSION and DATASIZE correspond to internal
registers.

The VALIDBIT (VBO) and BITINFIFO ports provide
information regarding the internal status of the buffer.

Specifically, VALIDBIT indicates how many bits are
available for reading and BITINFIFO indicates how many bits are
physically present in the output buffer 12. These values are not
always the same because of the internal organization of the data.
These values can be read by the other modules in the IDMA 10 as
well as by an external component through the slave interface
module 18. Again, this feature allows the external modules to
know the buffer status and decide whether or not to download data
to the buffer. VALIDBIT and BITINFIFO correspond to internal
registers.

The entire contents of the output buffer 12 can be read
and written through a RAM port without changing the internal
status of the FIFO pointers. This feature allows debug operations
on the buffer. The RAM port is accessible through the slave
interface module 18.

The module 13 is a finite state machine (FSM) that 
drives operation on the IP. 

Its main role is to take data from the input buffer 11,
download them in the IP, receive output data from the IP and
store them in the output buffer 12.






23301 SN 10/535,476 



Corrected version



As these operations can vary run time, especially in
prototyping systems, this FSM must be programmable run time. For
this reason it is preferably realized (in a manner known per se)
with RAM memories that can be written through the system bus.
Each RAM address contains a description of one finite state, with
all the possible state transitions and the output values. The
RBF 13 is activated and controlled by the MCU 14. When the RBF
13 flow finishes (if a finish state is defined) the RBF 13 can
drive high the RBFFINISH signal to the MCU 14. Any implementation
of a reprogrammable FSM can be used.
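
As a hedged illustration of the RAM based FSM principle (one RAM word per state, holding the outputs and the possible transitions), a minimal software model may look as follows. The names StateWord, RamBasedFsm, "ip_ready" and "ip_done" are hypothetical and introduced here only for the example; the actual word layout of the RBF 13 is not given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateWord:
    """Contents of one RAM address: the description of one state."""
    outputs: dict         # output values driven while in this state
    next_state: dict      # maps an input value to the next RAM address
    finish: bool = False  # marks an optional finish state


class RamBasedFsm:
    """Table-driven FSM whose table can be rewritten at run time."""

    def __init__(self, depth: int) -> None:
        self.ram = [None] * depth
        self.state = 0
        self.finished = False  # stands in for the RBFFINISH signal

    def write(self, address: int, word: StateWord) -> None:
        """Models a bus write that (re)programs one state."""
        self.ram[address] = word

    def step(self, input_value) -> dict:
        """Return the outputs of the current state, then take one transition."""
        word = self.ram[self.state]
        if word is None:
            raise RuntimeError("state %d is not programmed" % self.state)
        if word.finish:
            self.finished = True
            return word.outputs
        # Unlisted inputs leave the state unchanged (an assumption of this sketch).
        self.state = word.next_state.get(input_value, self.state)
        return word.outputs


fsm = RamBasedFsm(depth=4)
fsm.write(0, StateWord({"FIFOOE": 1, "FIFOWR": 0}, {"ip_ready": 1}))
fsm.write(1, StateWord({"FIFOOE": 0, "FIFOWR": 1}, {"ip_done": 2}))
fsm.write(2, StateWord({"FIFOOE": 0, "FIFOWR": 0}, {}, finish=True))
trace = [fsm.step(i)["FIFOWR"] for i in ("ip_ready", "busy", "ip_done", None)]
```

Changing the behaviour of the machine then amounts to further write calls, which mirrors the run-time programmability called for above.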

In the embodiment shown, the AMBA/AHB slave interface
18 is a standard AMBA slave interface that permits access to the
internal registers, the RBF 13 and the buffers 11 and 12. It
provides some control on the address values to verify them and
give error responses if necessary. In the presently preferred
embodiment, the whole IDMA addressing space is 64 Mbytes. The
base address (IDMA Base Address or IBA) can be changed run time
or can be fixed.
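
The address control mentioned above can be pictured, in a simplified and non-authoritative way, as a window check against the IBA. The function name decode and the choice of returning None to stand for an error response are assumptions of this sketch, as is the reading of 64 Mbytes as 64 x 2^20 bytes.

```python
IDMA_SPACE_BYTES = 64 * 1024 * 1024  # 64 Mbytes, read here as 64 * 2**20


def decode(address: int, iba: int):
    """Return the offset inside the IDMA window, or None for an error response."""
    offset = address - iba
    if 0 <= offset < IDMA_SPACE_BYTES:
        return offset
    return None


iba = 0x40000000  # example base address, chosen arbitrarily for the sketch
inside = decode(iba + 0x10, iba)
outside = decode(iba + IDMA_SPACE_BYTES, iba)
```

Because the base address is a parameter, the same check covers both a fixed IBA and one changed at run time.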

The MCU 14 controls the overall functionality of the
IDMA 10. It generates commands towards all the other blocks and
provides external information to the system using interrupts via
the module 20. Preferably, the MCU 14 is realized as a FSM that
executes instructions.

The instructions are loaded by the user in the internal
register 19 called instruction register or IR.

The core instruction is the GO instruction whose flow
is depicted in FIG. 7. The main purpose of the GO
instruction is to activate the flow of the RBF 13.

Essentially, the RBF 13 transfers data from the
input buffer 11 to the IP and from the IP to the output buffer
12. When executing a GO instruction (starting from a GO RECEIVED
step 200) the MCU 14 checks the status of buffers 11 and 12 (VBI
and VBO), before and after data processing 201. In the comparison
steps 202 and 203-204 that follow the start step 200 and the
processing step 201, the VBI (Valid Bit Inbuffer) and VBO (Valid
Bit Outbuffer) contents are compared with expected values. These
values are stored as EBI (Expected Bit Inbuffer) and EBO
(Expected Bit Outbuffer) values.

In particular the RBF flow is enabled, thus leading to
the processing step 201 via a first interrupt 205, only if VBI
(Valid Bit Inbuffer) is equal to or greater than EBI.

After the processing step 201, in a step 206 the MCU 14
also checks the RBFFINISH signal to ascertain whether the RBF has
finished its flow. Feedback is given to the system with specific
interrupts. This allows other modules to access the IDMA buffers
only under particular conditions. The complete GO instruction
flow is detailed in FIG. 7, where the steps 207 to 211 designate
other interrupts.

Normally, if all the data have been processed and all
the output results have been produced, the process should end
with interrupt 3, that is step 208.

The table reproduced hereinbelow details the meaning of
the interrupts in question.

INTERRUPT       MEANING

1 (step 205)    Data processing has begun as there are enough
                data in the inbuffer

2 (step 207)    No data processing is executed as there are
                not enough data in the inbuffer

3 (step 208)    Data processing finished. The outbuffer 12 is
                full and the inbuffer 11 is empty, i.e. VBI=0
                (step 212)

4 (step 209)    Data processing finished. The outbuffer 12 is
                not full and the inbuffer 11 is empty

5 (step 210)    Data processing finished. The outbuffer 12 is
                full and the inbuffer 11 is not empty
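
The interrupt selection listed in the table can be summarized by the following sketch, offered only as an aid to reading. It assumes that "outbuffer full" means that VBO has reached EBO and that "inbuffer empty" means VBI equal to zero; the function go_instruction and its run_rbf argument are hypothetical. The RBFFINISH check of step 206 and the interrupt of step 211 are not modelled, since their meaning is not detailed in the table.

```python
from typing import Callable, List, Tuple


def go_instruction(vbi: int, ebi: int, ebo: int,
                   run_rbf: Callable[[], Tuple[int, int]]) -> List[int]:
    """Return the interrupt numbers raised by one GO instruction.

    run_rbf stands in for the RBF flow (step 201) and returns the
    VBI and VBO values observed after processing.
    """
    if vbi < ebi:
        return [2]                    # step 207: not enough data, no processing
    raised = [1]                      # step 205: processing begins
    vbi_after, vbo_after = run_rbf()  # step 201
    out_full = vbo_after >= ebo       # assumption: "full" means VBO reached EBO
    in_empty = vbi_after == 0         # VBI = 0, step 212 in the table
    if out_full and in_empty:
        raised.append(3)              # step 208
    elif in_empty:
        raised.append(4)              # step 209
    elif out_full:
        raised.append(5)              # step 210
    # The remaining combination is not covered by the table and is left
    # without a closing interrupt in this sketch.
    return raised


normal_end = go_instruction(64, 64, 64, lambda: (0, 64))
starved = go_instruction(10, 32, 64, lambda: (0, 0))
```

With these assumptions the normal case (all input consumed, all output produced) ends with interrupt 3, in agreement with the text.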



The "GO" instruction is downloaded into each IDMA from
the CPU and enables the respective IDMA to perform a single IP
process.

In order to ensure that the IDMAs are always active
without CPU commands, each IDMA supports an extension of the "GO"
instruction ("GOAL" instruction), that substantially corresponds
to the flowchart of FIG. 7 with the additional provision
of return paths leading from each of the (end) interrupts
207, 208, 209, 210, and 211 back to the input of the comparison
step 202.

When the process is finished, the IDMA 10 always polls
the VBI value to determine when a new process can start.
Meanwhile the masterin module 15 (or the coupled
masterout block 16 of another IDMA) can store data in the input
buffer 11. Once the data are available a new RBF processing is
activated.

The interrupt controller 20 is a very simple FIFO that
receives interrupts from the MCU 14. Each interrupt arriving
from the MCU 14 is stored in the FIFO.

When the FIFO is not empty, one of the interrupt
signals is driven high to be recognized by an external device.
The user can configure the association between the interrupt
level and a particular interrupt pin.

When the external device receives the interrupt it can
access the interrupt FIFO 20 through the bus to read which
interrupt has been produced. When the interrupt FIFO is empty,
no interrupt is asserted. It is possible to mask some
interrupts. In this case the interrupt is not stored in the
FIFO.
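
A minimal behavioural sketch of such an interrupt FIFO is given below for illustration. The class InterruptFifo and its methods are hypothetical; the sketch further assumes that the pin driven high is the one associated with the oldest pending interrupt and that reading the FIFO removes the entry read, neither of which is stated in the text.

```python
from collections import deque


class InterruptFifo:
    """Toy interrupt controller: a FIFO with masking and level-to-pin mapping."""

    def __init__(self, pin_of_level: dict, masked=()) -> None:
        self._fifo = deque()
        self._pin_of_level = dict(pin_of_level)  # user-configured association
        self._masked = set(masked)

    def post(self, level: int) -> None:
        """Called on behalf of the MCU; masked interrupts are not stored."""
        if level not in self._masked:
            self._fifo.append(level)

    def asserted_pin(self):
        """Pin currently driven high, or None when the FIFO is empty."""
        if not self._fifo:
            return None
        return self._pin_of_level[self._fifo[0]]

    def read(self) -> int:
        """Bus read by the external device: which interrupt was produced."""
        return self._fifo.popleft()


ctrl = InterruptFifo({1: 0, 3: 1}, masked={2})
idle_pin = ctrl.asserted_pin()
for level in (1, 2, 3):
    ctrl.post(level)
first_pin = ctrl.asserted_pin()
served = [ctrl.read(), ctrl.read()]
final_pin = ctrl.asserted_pin()
```

Interrupt 2 is masked in this example and therefore never reaches the FIFO, so the external device reads only interrupts 1 and 3.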

The basic layout of the masterin block or module 15 is
shown in FIG. 8. The programmable masterin block (along
with the masterout block 16) is the core of the IDMA
functionality. Its purpose is to upload data through the bus 17
and to download them in the input buffer 11. It is activated by
the MCU 14 through an ENABLE signal.

As better explained in the following, the input buffer
11 is intended to be (virtually) coupled to the output buffer of
another IDMA, not shown in FIG. 4, located "upstream" in the
general layout of FIG. 2.

This preferably occurs either via the respective
masterin block 15 or via the masterout block 16 associated with
the output buffer of the IDMA located "upstream".

When coupling is obtained via the respective masterin
block 15, such masterin block 15 tries to fetch data from such
coupled output buffer of another IDMA; this occurs only if the
valid bit value in the input buffer 11 is less than the value
driven on the REFERENCE VALUE port. This control improves bus
occupation and system performance.

The fetch operation is structured in three parts.

At first, the masterin block 15 sends a request to the
coupled output buffer of another IDMA to know if there are enough
data to load. This request occurs through the bus 17 in one
clock cycle. The total amount of data to be loaded is determined
by the value on the BITTOTRANSFER port.

Then, the output buffer of the other IDMA being
questioned sends back an acknowledgement when the data are
available. This operation occurs through the bus 17 in one clock
cycle. Between the request and the acknowledgement the bus is
kept totally free.

Finally, when the acknowledgement is asserted, the
masterin block 15 proceeds to the transfer between the two IDMAs
involved. The process of request/acknowledgement is a
significant point in creating a virtual channel between two
IDMAs.

Thanks to this feature, the control CPU does not have
to control the IDMA behavior and it can run other procedures.
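
The three-part fetch (request, acknowledgement, transfer) can be sketched as a small state machine that records what the bus is used for at each tick. This is a simplified model under stated assumptions: the classes CoupledOutBuffer and MasterIn are hypothetical, the transfer is collapsed into a single "DATA" entry regardless of its length, and the acknowledgement is modelled as a check made by the requester rather than as a bus access initiated by the coupled buffer.

```python
from typing import List, Optional


class CoupledOutBuffer:
    """Hypothetical upstream output buffer with one pending request."""

    def __init__(self) -> None:
        self.valid_bits = 0
        self.requested_bits: Optional[int] = None

    def request(self, bits: int) -> None:
        self.requested_bits = bits

    def ack_ready(self) -> bool:
        return (self.requested_bits is not None
                and self.valid_bits >= self.requested_bits)

    def take(self) -> int:
        bits = self.requested_bits
        self.valid_bits -= bits
        self.requested_bits = None
        return bits


class MasterIn:
    """Requester side: fetches only when its inbuffer runs low."""

    def __init__(self, source: CoupledOutBuffer,
                 reference_value: int, bit_to_transfer: int) -> None:
        self.source = source
        self.reference_value = reference_value
        self.bit_to_transfer = bit_to_transfer
        self.inbuffer_valid_bits = 0
        self.state = "IDLE"
        self.bus_log: List[Optional[str]] = []  # None means the bus is free

    def tick(self) -> None:
        used = None
        if self.state == "IDLE":
            if self.inbuffer_valid_bits < self.reference_value:
                self.source.request(self.bit_to_transfer)
                used, self.state = "REQ", "WAIT_ACK"
        elif self.state == "WAIT_ACK":
            if self.source.ack_ready():
                used, self.state = "ACK", "TRANSFER"
        else:  # TRANSFER
            self.inbuffer_valid_bits += self.source.take()
            used, self.state = "DATA", "IDLE"
        self.bus_log.append(used)


upstream = CoupledOutBuffer()
master = MasterIn(upstream, reference_value=32, bit_to_transfer=64)
for _ in range(3):
    master.tick()          # request, then two ticks waiting with the bus free
upstream.valid_bits = 64   # the upstream IP has now produced its data
for _ in range(3):
    master.tick()          # acknowledgement, transfer, then idle
```

The None entries in bus_log between "REQ" and "ACK" correspond to the interval in which the bus is left free for other traffic.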

The basic layout of the masterout block or module 16 is
shown in FIG. 9. The programmable masterout block 16
(along with the masterin block 15) is the core of the IDMA
functionality. Its purpose is to upload data from the output
buffer 12 and download them through the bus 17. It can access
other components in the system only under particular conditions.
It is activated by the MCU 14 through an ENABLE signal.

As indicated previously coupling between the input 
buffer of an IDMA and the output buffer of another iraiA arranged 
"upstream" in the chain can also be achieved via the masterout 
15 block 16 associated with such output buffer. 

In that case, the masterout block 16 in question, when
enabled, tries to store data in the coupled input buffer of said
another IDMA 10 only if the valid bits in the output buffer 12
are more than the value driven on the BIT TO TRANSFER port. This
control improves bus occupation and system performance.

Again, the store operation is structured in three
parts.

At first the masterout block 16 sends a request to the
coupled input buffer of the other IDMA to know if there are
enough locations to store the data. This request occurs through
the bus 17 in one clock cycle. The total amount of data to be
stored is determined by the value on the BIT TO TRANSFER port.

Then, the input buffer of the other IDMA being
questioned sends back an acknowledgement when the number of
memory locations requested is available. This operation occurs
through the bus 17 in one clock cycle. Between the request and
the acknowledgement the bus is totally free.

Finally, when the acknowledgement is asserted, the
masterout block 16 proceeds to the transfer. The process of
request/acknowledgement just described is a significant point in
order to create a virtual channel between two IDMAs.

Thanks to this feature the control CPU does not have to 
control the IDMA behavior and it can run other procedures. 
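By way of illustration only, the bus activity during one store operation may be sketched as follows. The function names, the per-cycle trace of free locations, and the labels "REQUEST", "FREE", "ACK" and "TRANSFER" are assumptions made for this sketch; the description states only that the request and the acknowledgement each take one clock cycle and that the bus is free in between. The duration of the transfer itself is not specified, so it is shown as a single entry.

```python
def should_store(valid_bits: int, bits_to_transfer: int) -> bool:
    """Precondition: the valid bits in the output buffer must be more
    than the value driven on the BIT TO TRANSFER port."""
    return valid_bits > bits_to_transfer


def store_timeline(free_locations_per_cycle, bits_to_transfer):
    """Return the bus activity, cycle by cycle, for one store operation.

    free_locations_per_cycle is a hypothetical trace of how many
    locations are free in the downstream input buffer in each cycle
    following the request.
    """
    timeline = ["REQUEST"]              # part 1: one cycle on the bus
    for free in free_locations_per_cycle:
        if free >= bits_to_transfer:
            timeline.append("ACK")      # part 2: one cycle on the bus
            timeline.append("TRANSFER") # part 3: the data are moved
            return timeline
        timeline.append("FREE")         # bus left free while waiting
    return timeline                     # no acknowledgement in the trace
```

For instance, `store_timeline([2, 3, 8], 8)` yields a request, two free cycles, an acknowledgement and then the transfer, so other bus traffic could use the two intermediate cycles.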

All the internal memory resources of the IDMA 10
(internal registers and buffers) are mapped in the memory
starting from the IBA (IDMA Base Address). The IBA is chosen at
synthesis time and can be changed, if necessary, at run time. The
memory plan is preferably as shown in FIG. 10.

FIG. 11 essentially details the basic scheme of
FIG. 2 by highlighting the presence of input buffers (11A,
11B, 11C), output buffers (12A, 12B, 12C) and masterout blocks
(16A, 16B) in the IDMAs designated IDMA A, IDMA B, and IDMA C.

There, the IDMA A and IDMA B have respective masterout
blocks 16A, 16B that couple the associated output buffers 12A,
12B with the input buffers 11B, 11C of IDMA B and IDMA C.
Specifically, the output buffer 12A is coupled via the
masterout block 16A with the input buffer 11B and the output
buffer 12B is coupled via the masterout block 16B with the input
buffer 11C.

In such a chain arrangement each IDMA module has:

a) its output buffer (12A, 12B) coupled to the input
buffer (11B, 11C) of another IDMA module located "downstream" in
the chain; and/or

b) its input buffer (11B, 11C) coupled to the output
buffer (12A, 12B) of another IDMA module located "upstream" in
the chain.

Specifically, the IDMA A module fulfils only condition
a); the IDMA B module fulfils both conditions a) and b); and the
IDMA C module fulfils only condition b).
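Conditions a) and b) can be checked mechanically once the couplings of the chain are written down. The following is a minimal sketch; the dictionary `couplings` and the function `conditions` are hypothetical names introduced here, with each entry meaning that the masterout block of the key module feeds the input buffer of the value module.

```python
# Couplings of the chain of FIG. 11: 12A -> 11B via 16A, 12B -> 11C via 16B.
couplings = {"A": "B", "B": "C"}


def conditions(module: str, couplings: dict) -> tuple:
    """Return (a, b) for one IDMA module.

    a: its output buffer is coupled to a downstream input buffer.
    b: its input buffer is coupled to an upstream output buffer.
    """
    a = module in couplings
    b = module in couplings.values()
    return a, b
```

With the couplings above, `conditions("A", couplings)` gives `(True, False)`, `conditions("B", couplings)` gives `(True, True)` and `conditions("C", couplings)` gives `(False, True)`, matching the three cases just listed.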

Operation of the transmission chain shown in
FIG. 11 is organized in several steps, and in fact only the three
initial steps involve the CPU.

At configuration, after system start up, the CPU
configures all the IDMAs. It downloads the RBF programs and all
the default values for the internal registers. The writing
address values for the output buffers are also initialized, so as
to couple the masterout block 16A with the input buffer 11B and
the masterout block 16B with the input buffer 11C.

To activate the IDMAs, a "GOAL" instruction (as
described in the foregoing; see also FIG. 7) is transmitted
to every IDMA.

The CPU transfers the data to be processed into the
input buffer 11A. The RBF 13 in IDMA A begins to transfer data
to IP A and writes output data from the IP A in the output buffer
12A.

As soon as the output buffer 12A is filled with data,
the masterout block 16A transfers the data to the input buffer
11B: this occurs after performing the request/acknowledgement
procedure described in the foregoing with the input buffer 11B.

As soon as the input buffer 11B is filled with data,
the RBF 13 in IDMA B transfers the data to IP B and writes data
from IP B in the output buffer 12B.

As soon as the output buffer 12B is filled with data,
the masterout block 16B transfers the data to the input buffer
11C. Again, this occurs after a request/acknowledgement
procedure with the input buffer 11C.

As soon as the input buffer 11C is filled with data,
the RBF 13 in IDMA C transfers the data to IP C and writes data
from IP C in the output buffer 12C.

An interrupt is sent to the CPU to indicate that valid
data are available in the output buffer 12C.

The system loops through the steps exemplified in the
foregoing until all the data are processed.
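The sequence of steps may be summarized by the following simplified sketch. It is a sequential software model under stated assumptions: each IP is stood in for by an ordinary function, the name `run_chain` and the log strings are hypothetical, and the concurrency of the real hardware (where all the IDMAs work at the same time) is not modelled.

```python
def run_chain(ips, blocks):
    """Push each data block through a chain of IDMAs.

    ips: ordered list of (name, function) pairs standing in for the
         IPs (for example IP A, IP B, IP C).
    blocks: the data blocks the CPU writes, one at a time, into the
            input buffer of the first IDMA.
    Returns the processed blocks and a log of the steps performed.
    """
    results, log = [], []
    for block in blocks:                 # loop until all data are processed
        buffer = list(block)
        log.append(f"CPU -> input buffer of IDMA {ips[0][0]}")
        for i, (name, ip) in enumerate(ips):
            # The RBF feeds the IP and writes its results in the output buffer.
            buffer = [ip(x) for x in buffer]
            log.append(f"RBF {name}: input buffer -> IP {name} -> output buffer")
            if i + 1 < len(ips):
                # The masterout block moves the data downstream after
                # the request/acknowledgement procedure.
                log.append(f"masterout {name}: request/ack, transfer to IDMA {ips[i + 1][0]}")
        log.append("interrupt -> CPU")   # valid data in the last output buffer
        results.append(buffer)
    return results, log
```

In the log produced by this model, the CPU appears only when it loads a block and when it receives the final interrupt; every intermediate transfer is handled by the IDMAs themselves.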

It is evident that this is just a possible
implementation of an IDMA based architecture, which in fact may
include any number of IDMAs. Several possible variants can be
easily conceived, such as for instance adding a masterin block to
IDMA A and/or a masterout block to IDMA C.

Of course, without prejudice to the underlying
principle of the invention, the details and embodiments may vary,
even significantly, with respect to what has been described by
way of example only, without departing from the scope of the
invention as defined by the annexed claims.